Latent Semantic Analysis for Russian Literature Investigation
Abstract
The paper presents the results of experiments applying Latent Semantic Analysis (LSA) to textual data. The method is explained briefly, with special attention to its potential for comparing and investigating Russian literary texts. Two hypotheses are tested: • texts by the same author are alike and can be distinguished from those by a different author; • prose and poetry can be identified automatically.
Related resources
Latent Semantic Analysis for German Literature Investigation
The paper presents the results of experiments of usage of LSA for analysis of textual data. The method is explained in brief and special attention is pointed on its potential for comparison and investigation of German literature texts. Two hypotheses are tested: 1) the texts by the same author are alike and can be distinguished from the ones by different person; 2) the prose and poetry can be a...
Query expansion based on relevance feedback and latent semantic analysis
Web search engines are one of the most popular tools on the Internet which are widely-used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
Unsupervised Topic Modeling for Short Texts Using Distributed Representations of Words
We present an unsupervised topic model for short texts that performs soft clustering over distributed representations of words. We model the low-dimensional semantic vector space represented by the dense distributed representations of words using Gaussian mixture models (GMMs) whose components capture the notion of latent topics. While conventional topic modeling schemes such as probabilistic l...
Preliminary Experiments on Literature Based Discovery using the Semantic Vectors Package
This paper presents a literature based discovery (LBD) implementation that uses Lucene for indexing, the Semantic Vectors (SV) package for latent semantic analysis, Neo4j for graph database storage, Gephi for visual representation along with custom code written by the author. The approach of using a latent semantic analysis based systems like SV to do LBD is not new, but going the next steps of...
Contextual analysis in word-for-word MT
EXPERIMENTS with word-for-word MT of Russian scientific literature have given results which, except for such limited purposes as indexing, are far from satisfactory. The difficulty is not so much one of word order as of syntactic and semantic ambiguity of individual words. Regardless of the treatment of the problem of inflected forms, for example, it is impossible in the majority of instances t...